364 research outputs found
Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word ``reinforcement.'' The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying files
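The exploration/exploitation trade-off and learning from delayed reinforcement that the survey discusses can be illustrated with tabular Q-learning on a toy chain MDP. This is a hedged sketch: the chain environment and every parameter value below are illustrative choices, not taken from the survey.

```python
import random

def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain MDP: action 1 moves right,
    action 0 moves left, and reaching the last state pays reward 1."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Exploration/exploitation trade-off: epsilon-greedy choice,
            # with ties between equal Q-values broken at random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(Q[s])
                a = rng.choice([i for i, v in enumerate(Q[s]) if v == best])
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Delayed reward propagates backward through the bootstrap term.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
```

After training, the greedy policy prefers moving right in every non-terminal state, even though reward only arrives at the end of the chain.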
Visualizing Convolutional Networks for MRI-based Diagnosis of Alzheimer's Disease
Visualizing and interpreting convolutional neural networks (CNNs) is an
important task to increase trust in automatic medical decision making systems.
In this study, we train a 3D CNN to detect Alzheimer's disease based on
structural MRI scans of the brain. Then, we apply four different gradient-based
and occlusion-based visualization methods that explain the network's
classification decisions by highlighting relevant areas in the input image. We
compare the methods qualitatively and quantitatively. We find that all four
methods focus on brain regions known to be involved in Alzheimer's disease,
such as inferior and middle temporal gyrus. While the occlusion-based methods
focus more on specific regions, the gradient-based methods pick up distributed
relevance patterns. Additionally, we find that the distribution of relevance
varies across patients, with some having a stronger focus on the temporal lobe,
whereas for others more cortical areas are relevant. In summary, we show that
applying different visualization methods is important to understand the
decisions of a CNN, a step that is crucial to increase clinical impact and
trust in computer-based decision support systems.
Comment: MLCN 201
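Of the methods compared, occlusion-based visualization is the easiest to sketch: slide a patch over the input, replace it with a baseline value, and record how much the model's score drops. The 2D toy below shows the idea; the paper works on 3D MRI volumes, and the patch size and stand-in scoring function here are invented for illustration.

```python
def occlusion_map(predict, image, patch, stride, baseline=0.0):
    """Occlusion sensitivity: slide a patch over the image, replace the
    covered pixels with a baseline value, and record how far the model's
    score drops. Large drops mark regions the decision depends on."""
    h, w = len(image), len(image[0])
    ref = predict(image)
    heat = [[0.0] * w for _ in range(h)]
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            occluded = [row[:] for row in image]
            for yy in range(y, y + patch):
                for xx in range(x, x + patch):
                    occluded[yy][xx] = baseline
            drop = ref - predict(occluded)
            for yy in range(y, y + patch):
                for xx in range(x, x + patch):
                    heat[yy][xx] = drop
    return heat

# Stand-in "model": score is the mean intensity of the top-left quadrant,
# so only occluding that quadrant should change the score.
def score(img):
    return sum(img[y][x] for y in range(16) for x in range(16)) / 256.0

image = [[1.0] * 32 for _ in range(32)]
heat = occlusion_map(score, image, patch=16, stride=16)
```

The heatmap is non-zero exactly where the stand-in model looks, which is the property the paper exploits to localize disease-relevant brain regions.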
Context-Aware Conversational Agents Using POMDPs and Agenda-Based Simulation
Proceedings of: Workshop on User-Centric Technologies and Applications (CONTEXTS 2011), Salamanca, April 6-8, 2011
Context-aware systems in combination with mobile devices offer new
opportunities in the areas of knowledge representation, natural language
processing and intelligent information retrieval. Our vision is that natural
spoken conversation with these devices can eventually become the preferred
mode for managing their services by means of conversational agents. In this
paper, we describe the application of POMDPs and agenda-based user simulation
to learn optimal dialog policies for the dialog manager in a conversational
agent. We have applied this approach to develop a statistical dialog manager
for a conversational agent which acts as a voice logbook to collect home
monitored data from patients suffering from diabetes.
Funded by projects CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, CAM CONTEXTS (S2009/TIC-1485), and DPS2008-07029-C02-02.
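The POMDP view treats the user's goal as hidden state that the dialog manager tracks as a belief distribution, updated after each observation. A minimal sketch of the discrete Bayesian belief update follows; the goal set, observation model, and probabilities are invented for illustration and are not the paper's model.

```python
def belief_update(belief, obs_prob, action, observation):
    """Bayesian belief update over a static hidden user goal:
    b'(g) is proportional to P(observation | goal, action) * b(g)."""
    unnorm = {g: obs_prob(observation, g, action) * p
              for g, p in belief.items()}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

# Toy observation model: speech recognition hears the true goal
# with probability 0.8 and confuses it otherwise.
def obs_prob(obs, goal, action):
    return 0.8 if obs == goal else 0.2

belief = {"log_glucose": 0.5, "log_insulin": 0.5}
belief = belief_update(belief, obs_prob, "ask_goal", "log_glucose")
```

Starting from a uniform belief, hearing "log_glucose" shifts the belief to 0.8 for that goal; the learned dialog policy then acts on the belief rather than on the raw, possibly misrecognized utterance.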
Learning Symbolic Models of Stochastic Domains
In this article, we work towards the goal of developing agents that can learn
to act in complex worlds. We develop a probabilistic, relational planning rule
representation that compactly models noisy, nondeterministic action effects,
and show how such rules can be effectively learned. Through experiments in
simple planning domains and a 3D simulated blocks world with realistic physics,
we demonstrate that this learning algorithm allows agents to effectively model
world dynamics.
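A probabilistic relational planning rule of the kind described maps an action to a distribution over alternative effect sets. The sketch below samples such a rule; the `pickup` rule, its literals, and its probabilities are invented for illustration.

```python
import random

# Outcomes of one rule: (probability, literals added, literals deleted).
pickup_rule = [
    (0.8, {"holding(a)"}, {"on_table(a)"}),  # grasp succeeds
    (0.2, set(), set()),                     # noise outcome: nothing happens
]

def apply_rule(state, rule, rng):
    """Sample one outcome of a noisy, nondeterministic planning rule and
    apply its add/delete effects to the state (a set of ground literals)."""
    r, acc = rng.random(), 0.0
    for prob, add, delete in rule:
        acc += prob
        if r < acc:
            return (state - delete) | add
    return state

rng = random.Random(0)
successes = sum("holding(a)" in apply_rule({"on_table(a)"}, pickup_rule, rng)
                for _ in range(1000))
```

Over many samples the empirical success rate approaches the rule's 0.8, which is exactly the statistic a rule learner estimates from observed action effects.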
Competition in Social Networks: Emergence of a Scale-free Leadership Structure and Collective Efficiency
Using the minority game as a model for competition dynamics, we investigate
the effects of inter-agent communications on the global evolution of the
dynamics of a society characterized by competition for limited resources. The
agents communicate across a social network with small-world character that
forms the static substrate of a second network, the influence network, which is
dynamically coupled to the evolution of the game. The influence network is a
directed network, defined by the inter-agent communication links on the
substrate along which communicated information is acted upon. We show that the
influence network spontaneously develops hubs with a broad distribution of
in-degrees, defining a robust leadership structure that is scale-free.
Furthermore, in realistic parameter ranges, facilitated by information exchange
on the network, agents can generate a high degree of cooperation making the
collective almost maximally efficient.
Comment: 4 pages, 2 postscript figures included
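The minority game underlying the model is simple to state: each round, every agent picks one of two sides and the less-crowded side wins. A bare-bones sketch without the communication network follows; the agent count and the uniformly random policy are illustrative simplifications.

```python
import random

def play_round(n_agents, rng):
    """One round of the minority game: each agent picks side 0 or 1;
    the side chosen by fewer agents wins (competition for a scarce
    resource), so fewer than half the agents can win per round."""
    choices = [rng.randrange(2) for _ in range(n_agents)]
    ones = sum(choices)
    minority = 1 if ones < n_agents / 2 else 0
    winners = [i for i, c in enumerate(choices) if c == minority]
    return minority, winners

rng = random.Random(0)
minority, winners = play_round(101, rng)
```

With an odd number of agents there is always a strict minority; collective efficiency in this game is usually measured by how close the winner count stays to that maximum of (n_agents - 1) / 2 over many rounds.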
Self-Modification of Policy and Utility Function in Rational Agents
Any agent that is part of the environment it interacts with and has versatile
actuators (such as arms and fingers), will in principle have the ability to
self-modify -- for example by changing its own source code. As we continue to
create more and more intelligent agents, chances increase that they will learn
about this ability. The question is: will they want to use it? For example,
highly intelligent systems may find ways to change their goals to something
more easily achievable, thereby `escaping' the control of their designers. In
an important paper, Omohundro (2008) argued that goal preservation is a
fundamental drive of any intelligent system, since a goal is more likely to be
achieved if future versions of the agent strive towards the same goal. In this
paper, we formalise this argument in general reinforcement learning, and
explore situations where it fails. Our conclusion is that the self-modification
possibility is harmless if and only if the value function of the agent
anticipates the consequences of self-modifications and uses the current
utility function when evaluating the future.
Comment: Artificial General Intelligence (AGI) 201
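The paper's condition can be caricatured in a few lines: an agent that could install a trivially satisfied utility function only rejects the rewrite if its value function scores the future with the *current* utility. This toy is illustrative only; the outcome names and utilities are invented, and the paper's actual formalism is general reinforcement learning.

```python
def evaluate(self_modify, use_current_utility):
    """Score the future of an agent that may overwrite its own utility.
    current_u rewards real goal achievement; trivial_u is the utility
    the agent could install, which rewards every outcome equally."""
    current_u = lambda outcome: 1.0 if outcome == "goal" else 0.0
    trivial_u = lambda outcome: 1.0
    # After self-modifying, the agent idles instead of pursuing the goal.
    outcome = "idle" if self_modify else "goal"
    u = current_u if use_current_utility or not self_modify else trivial_u
    return u(outcome)

# Anticipating agent: judges the rewrite with its current utility.
safe = evaluate(True, use_current_utility=True)
# Naive agent: lets the installed utility score the future.
unsafe = evaluate(True, use_current_utility=False)
```

The anticipating agent scores the self-modification at 0 and keeps its goal, while the naive agent scores it at the maximum and "escapes" its designers' objective, mirroring the harmless-if-and-only-if condition in the abstract.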
Regression with Linear Factored Functions
Many applications that use empirically estimated functions face a curse of
dimensionality, because the integrals over most function classes must be
approximated by sampling. This paper introduces a novel regression algorithm
that learns linear factored functions (LFF). This class of functions has
structural properties that allow certain integrals to be solved analytically
and point-wise products to be computed. Applications like belief propagation
and reinforcement learning can exploit these properties to break the curse
and speed up computation. We derive a regularized greedy optimization scheme
that learns factored basis functions during training. The novel regression
algorithm performs competitively with Gaussian processes on benchmark tasks,
and the learned LFF functions are very compact, with 4-9 factored basis
functions on average.
Comment: Under review as conference paper at ECML/PKDD 201
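The structural property the abstract relies on can be shown directly: a point-wise product of two linear factored functions is again a linear factored function, with factors multiplied dimension-wise. This sketch assumes the standard LFF form f(x) = Σ_k w_k Π_d φ_kd(x_d); the example functions are invented.

```python
from math import prod

def lff_eval(weights, factors, x):
    """Evaluate f(x) = sum_k w_k * prod_d phi_kd(x_d), where each basis
    function is a product of one-dimensional factor functions."""
    return sum(w * prod(phi(xd) for phi, xd in zip(fs, x))
               for w, fs in zip(weights, factors))

def lff_product(w1, f1, w2, f2):
    """Point-wise product of two LFFs, computed analytically: factors
    multiply dimension-wise, giving len(w1)*len(w2) basis functions."""
    weights, factors = [], []
    for a, fa in zip(w1, f1):
        for b, fb in zip(w2, f2):
            weights.append(a * b)
            factors.append([(lambda p, q: lambda t: p(t) * q(t))(p, q)
                            for p, q in zip(fa, fb)])
    return weights, factors

# f(x) = x0 * (x1 + 1) and g(x) = 2 * x0**2, each as a 1-term LFF in 2D.
wf, ff = [1.0], [[lambda t: t, lambda t: t + 1]]
wg, fg = [2.0], [[lambda t: t * t, lambda t: 1.0]]
wp, fp = lff_product(wf, ff, wg, fg)
```

The product never expands the function on a grid, which is why operations like belief propagation can sidestep sampling-based integration over high-dimensional inputs.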